Exploiting Data Lineage for Parallel Optimization in Extensible DBMSs

Authors

  • Eddie C. Shek
  • Richard R. Muntz
Abstract

Extensibility and high query performance are important requirements of advanced large-scale information systems, since complex data analysis often requires application-specific operations that have to be introduced by the user issuing the query. One of the decisions to make when parallelizing a query execution plan is how the input data streams to evaluators implementing logical operations can be divided so that clones of the same evaluator can process them in parallel. Parallelization options for queries in database systems with a static data model are generally hardcoded and embedded in the parallel query optimizer. In contrast, the characteristics of user-defined evaluators that dictate how they can be parallelized have to be captured systematically as the evaluators are introduced, so that parallelization can be automatic. Toward the goal of supporting automatic parallelization of queries containing complex user-defined evaluators in an extensible DBMS, we devised a "relevance window" model to capture the inherent data lineage [2] characteristics of evaluators on multidimensional data sets. Informally, the relevance window of an evaluator defines the scope of influence that input data records have on the values of records in the output data space. An evaluator's relevance window constrains the data partitioning opportunities available for that evaluator. Specifically, it determines how much input data stream partitions must overlap if they are to be evaluated in parallel and independently by clones of the evaluator. As a basis for extensible query processing, we introduced two higher-order computation patterns. Sequence derivation captures the powerful iterative computation pattern that allows a sequence to be effectively derived from an input sequence, while the window aggregation framework generalizes the computing structure of stencil algorithms to support the composition of aggregation evaluators against multidimensional data sets.
We identified a number of classes of special relevance window characteristics for these computation patterns that can be exploited for query parallelization. One of the most interesting classes of relevance windows is sliding windows. Consider an evaluator that tracks the n-day running average of a parameter. The parameter value for each day in the time dimension of the input influences the average value for the next n days. We say that the evaluator has an n-day sliding relevance window in the time dimension, since the input records on which each output record depends fall within a fixed n-day range in the time dimension and are at the same location relative to the output. To parallelize an evaluator with a sliding relevance window in a dimension, we can partition the input data streams into substreams along that dimension with an amount of overlap equal to the size of the relevance window, and assign each substream to an evaluator clone. The parallel query evaluation model described has been implemented in Conquest [1], an extensible parallel geoscientific query processing system. Conquest implements a query parallelization framework that extends relational parallel query optimization algorithms so that the parallelization characteristics of user-defined evaluators guide query parallelization in an extensible parallel query processing environment.
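The sliding-window partitioning described above can be illustrated with a minimal Python sketch. It is not Conquest code: the function names (`running_average`, `partition_with_overlap`, `parallel_running_average`) are hypothetical, the input is assumed to be one record per day in a plain list, and a thread pool stands in for the evaluator clones. For this discrete trailing average the sketch uses an overlap of n - 1 records between adjacent substreams, which is one reading of an overlap sized by the relevance window.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def running_average(values: List[float], n: int) -> List[float]:
    """Sequential evaluator: each output is the mean of n consecutive records."""
    return [sum(values[i:i + n]) / n for i in range(len(values) - n + 1)]


def partition_with_overlap(values: List[float], n: int,
                           num_clones: int) -> List[List[float]]:
    """Split the input into contiguous substreams that overlap by n - 1 records.

    The output records are divided evenly among the clones, and each clone
    receives exactly the input records its outputs depend on.
    """
    num_outputs = len(values) - n + 1
    if num_outputs <= 0:
        return []
    per_clone = -(-num_outputs // num_clones)  # ceiling division
    parts = []
    for start in range(0, num_outputs, per_clone):
        stop = min(start + per_clone, num_outputs)
        # Outputs start..stop-1 need inputs start..stop+n-2.
        parts.append(values[start:stop + n - 1])
    return parts


def parallel_running_average(values: List[float], n: int,
                             num_clones: int) -> List[float]:
    """Run one evaluator clone per substream and concatenate the results."""
    parts = partition_with_overlap(values, n, num_clones)
    with ThreadPoolExecutor(max_workers=num_clones) as pool:
        # map() returns results in submission order, so output order is preserved.
        chunks = list(pool.map(lambda part: running_average(part, n), parts))
    return [x for chunk in chunks for x in chunk]
```

Because every substream carries all the input records its outputs depend on, the clones need no communication, and concatenating their outputs should reproduce the sequential result.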


Similar resources

Exploiting the Functionality of Object-Oriented Database Management Systems for Information Retrieval

In this paper, we present the approach of FIRE to utilizing an object-oriented Database Management System (DBMS) for Information Retrieval (IR) purposes. First, a comprehensive overview of previous attempts to use DBMSs for implementing IR systems is given. Next, differences between DBMSs and IR systems, with regard to indexing and retrieval, are discussed. In addition, some shortcomings of DBMS...


EROC: A Toolkit for Building NEATO Query Optimizers

EROC (Extensible, Reusable Optimization Components) is a toolkit for building query optimizers. EROC’s components are C++ classes based on abstractions we have identified as central to query optimization, not only in relational DBMSs, but in extended relational and object-oriented DBMSs as well. EROC’s use of C++ classes clarifies the mapping from application domain (optimization) abstraction...


Exploiting Reconfigurable FPGA for Parallel Query Processing in Computation Intensive Data Mining Applications

This work concentrates on exploiting re-configurable Field Programmable Gate Arrays (FPGAs), an SRAM-based FPGA coprocessor, for query processing in computation-intensive data mining applications. Complex computation-intensive data mining applications in geoscientific and medical information systems environments often require support for extensibility and parallel processing to deliver the nece...


Query Processing in Object-Oriented Database Systems

One of the basic functionalities of database management systems (DBMSs) is to be able to process declarative user queries. The first generation of object-oriented DBMSs did not provide declarative query capabilities. However, the last decade has seen significant research in defining query models (including calculi, algebra and user languages) and in techniques for processing and optimizing them...


A Cost Model for Parallel Navigational Access in Complex-Object DBMSs

In contrast to relational query processing, one of the most important extensions of query processing in object-oriented DBMSs (OODBMSs) is navigational access to objects. So far, optimization of this kind of access has been primarily supported by special access path structures or object clustering strategies. Parallelism, although an important topic in large relational systems, has not been exp...




Publication date: 1999